Primitive Adaptive Critics

Authors

  • Danil V. Prokhorov
  • Lee A. Feldkamp
Abstract

We propose a simple framework for critic-based training of recurrent neural networks and feedback controllers. We call the critics primitive adaptive critics because we represent them with the simplest possible architecture (a bias weight only). The framework rests on two premises. The first is a natural similarity between Dual Heuristic Programming (DHP), a form of approximate dynamic programming, and backpropagation through time (BPTT). The second is our emphasis on developing a truly online critic-based training procedure that is competitive with truncated BPTT in performance and computational cost. Three examples illustrate the main features of the proposed framework.
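
The similarity between DHP and BPTT noted in the abstract can be illustrated with a small sketch. A DHP critic is trained to estimate the derivative of the cost-to-go with respect to the state, and BPTT obtains that same derivative by a backward recursion over a stored trajectory. The scalar system x(t+1) = tanh(w x(t)), the quadratic cost, and every name below are assumptions chosen for illustration; none of them is taken from the paper.

```python
import math

def rollout(x0, w, steps):
    """Simulate the assumed scalar system x(t+1) = tanh(w * x(t))."""
    xs = [x0]
    for _ in range(steps):
        xs.append(math.tanh(w * xs[-1]))
    return xs

def cost_to_go(x0, w, steps, gamma):
    """Discounted cost J(0) = sum_k gamma^k * 0.5 * x(k)^2 from x0."""
    xs = rollout(x0, w, steps)
    return sum((gamma ** k) * 0.5 * x * x for k, x in enumerate(xs))

def backward_derivatives(xs, w, gamma):
    """BPTT-style backward recursion for lam(t) = dJ(t)/dx(t).

    lam(t) = dU/dx(t) + gamma * (dx(t+1)/dx(t)) * lam(t+1)
    This has the same form as the target a DHP critic would be trained on.
    """
    lam = [0.0] * len(xs)
    lam[-1] = xs[-1]  # at the final step only the local cost contributes
    for t in range(len(xs) - 2, -1, -1):
        jac = w * (1.0 - xs[t + 1] ** 2)  # d tanh(w x)/dx at x(t)
        lam[t] = xs[t] + gamma * jac * lam[t + 1]
    return lam

x0, w, steps, gamma = 0.7, 1.3, 5, 0.9
xs = rollout(x0, w, steps)
lam = backward_derivatives(xs, w, gamma)

# Central finite difference of J(0) with respect to x0, as an independent check.
eps = 1e-6
fd = (cost_to_go(x0 + eps, w, steps, gamma)
      - cost_to_go(x0 - eps, w, steps, gamma)) / (2 * eps)
print(abs(lam[0] - fd) < 1e-6)
```

In this sketch the recursion is evaluated exactly over a finite horizon. A critic-based method would instead learn an approximation of `lam` online, which is where the choice of critic architecture enters.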

Similar resources

Model-Building Adaptive Critics for Semi-Markov Control

Adaptive (or actor) critics are a class of reinforcement learning algorithms. Generally, in adaptive critics, one starts with randomized policies and gradually updates the probability of selecting actions until a deterministic policy is obtained. Classically, these algorithms have been studied for Markov decision processes under model-free updates. Algorithms that build the model are often more...

Observations on the Practical Use of Adaptive Critics

By studying adaptive critic designs (ACD) from the standpoint of practical use in training neural networks, we expect to establish the types of problems for which ACD might be preferable to more established methods. To restrict the scope, we have chosen to concentrate on applying ACD, specifically derivative critics, to the training of recurrent networks [1]. This is actually less restrictive th...

Using Greedy Randomize Adaptive Search Procedure for solve the Quadratic Assignment Problem

The greedy randomized adaptive search procedure is an iterative metaheuristic for solving combinatorial problems. Each iteration consists of two phases, construction and local search. A high-quality feasible initial solution is built in the construction phase and then improved in the local search phase. The best solution found over all iterations is returned as the output. In this stu...

Greedy Adaptive Critics for LQR Problems: Convergence Proofs

A number of success stories have been told where reinforcement learning has been applied to problems in continuous state spaces using neural nets or other sorts of function approximators in the adaptive critics. However, the theoretical understanding of why and when these algorithms work is inadequate. This is clearly exemplified by the lack of convergence results for a number of important situa...

Reinforcement Learning in Markovian and Non-Markovian Environments

This work addresses three problems with reinforcement learning and adaptive neuro-control: 1. Non-Markovian interfaces between learner and environment. 2. On-line learning based on system realization. 3. Vector-valued adaptive critics. An algorithm is described which is based on system realization and on two interacting fully recurrent continually running networks which may learn in parallel. ...

Publication date: 1997